Selection from Structured Data Sets
Abstract
A large body of work studies the complexity of selecting the j-th largest element in an arbitrary set of n elements (a.k.a. the select(j) operation). In this work, we study the complexity of select in data that is partially structured by an initial preprocessing stage and in a data structure that is dynamically maintained. We provide lower and upper bounds in the comparison-based model. For preprocessing, we show that making at most α(n) · n comparisons during preprocessing (before the rank j is provided) implies that select(j) must make at least (2 + ε) · n/(e · 2^{α(n)}) comparisons in the worst case, where ε > 2^{-40}. For dynamically maintained data structures, we show that if the amortized number of comparisons executed with each insert operation is bounded by i(n), then select(j) must make at least (2 + ε) · n/(e · 2^{i(n)}) comparisons in the worst case, no matter how costly the other data structure operations are. When only insert is used, we provide a lower bound on the complexity of findmedian. This lower bound is much higher than the complexity of maintaining the minimum, thus formalizing the intuitive difference between findmin and findmedian. Finally, we present a new explicit adversary for comparison-based algorithms and use it to show adversary lower bounds for selection problems. We demonstrate the power of this adversary by improving the best known lower bound for the findany operation in a data structure and by slightly improving the best adversary lower bound for sorting.
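To make the cost model concrete, the following Python sketch counts comparisons for select(j) at the two extremes of preprocessing effort: none at all (a randomized quickselect) and a full sort done before j is known. It is only an illustration of the comparison-based model, not a construction from the paper; the names `Counter`, `quickselect`, `preprocess_sort`, and `select_sorted` are hypothetical.

```python
import random
from functools import cmp_to_key


class Counter:
    """Counts element comparisons, the only cost charged in this model."""

    def __init__(self):
        self.count = 0

    def less(self, a, b):
        self.count += 1
        return a < b


def quickselect(items, j, counter):
    """Return the j-th largest element (j = 1 is the maximum), no preprocessing."""
    items = list(items)
    k = len(items) - j  # position of the answer in ascending order
    while True:
        pivot = items[random.randrange(len(items))]
        smaller, equal, larger = [], [], []
        for x in items:
            if counter.less(x, pivot):
                smaller.append(x)
            elif counter.less(pivot, x):
                larger.append(x)
            else:
                equal.append(x)
        if k < len(smaller):
            items = smaller
        elif k < len(smaller) + len(equal):
            return pivot
        else:
            k -= len(smaller) + len(equal)
            items = larger


def preprocess_sort(items, counter):
    """Fully sort before the rank is known, counting every comparison."""

    def cmp(a, b):
        if counter.less(a, b):
            return -1
        if counter.less(b, a):
            return 1
        return 0

    return sorted(items, key=cmp_to_key(cmp))


def select_sorted(sorted_items, j):
    """After a full sort, select(j) is an index lookup with zero comparisons."""
    return sorted_items[-j]


if __name__ == "__main__":
    data = random.sample(range(10_000), 1_000)
    j = 500

    online = Counter()
    answer = quickselect(data, j, online)

    offline = Counter()
    table = preprocess_sort(data, offline)
    assert select_sorted(table, j) == answer

    print("select(j) comparisons without preprocessing:", online.count)
    print("preprocessing comparisons for a full sort:", offline.count)
```

Between these extremes lie preprocessing budgets too small to sort, which is the regime the lower bounds above address.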
Journal:
- Electronic Colloquium on Computational Complexity (ECCC)
Publication date: 2004